Overview

Dataset statistics

Number of variables: 16
Number of observations: 4238
Missing cells: 1883
Missing cells (%): 2.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 501.2 KiB
Average record size in memory: 121.1 B
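
The two derived figures above can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 16
n_observations = 4238
missing_cells = 1883
total_size_kib = 501.2

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of cells that are empty
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the table (2.8% and 121.1 B).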

Variable types

Categorical: 8
Numeric: 8

Alerts

currentSmoker is highly correlated with cigsPerDay (High correlation)
cigsPerDay is highly correlated with currentSmoker (High correlation)
prevalentHyp is highly correlated with sysBP and 1 other field (High correlation)
sysBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diaBP is highly correlated with prevalentHyp and 1 other field (High correlation)
diabetes is highly correlated with glucose (High correlation)
glucose is highly correlated with diabetes (High correlation)
age is highly correlated with AgeBucket (High correlation)
AgeBucket is highly correlated with age (High correlation)
AgeBucket has 1883 (44.4%) missing values (Missing)
cigsPerDay has 2144 (50.6%) zeros (Zeros)
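
The report does not state how AgeBucket was derived. One reading that fits the numbers is right-closed binning of age at 30, 40 and 50, which would leave every age above 50 without a bucket and so explain both the missing values and the correlation with age. The sketch below tests that reading against the age counts listed under the age variable; the helper `age_bucket` and its `edges` parameter are hypothetical names, not part of the report.

```python
from typing import Optional

def age_bucket(age: float, edges=(30, 40, 50)) -> Optional[str]:
    """Return the right-closed interval label containing `age`, or None."""
    for low, high in zip(edges, edges[1:]):
        if low < age <= high:
            return f"({low}, {high}]"
    return None  # outside every interval, so the bucket is missing

# Counts for ages 32 to 40, taken from the age variable's value table.
age_counts = {32: 1, 33: 5, 34: 18, 35: 42, 36: 84,
              37: 92, 38: 144, 39: 169, 40: 191}

in_first_bucket = sum(
    count for age, count in age_counts.items()
    if age_bucket(age) == "(30, 40]"
)
print(in_first_bucket)   # compare with the (30, 40] count reported for AgeBucket
print(age_bucket(61))    # an age above 50 falls in no bucket
```

Under this assumption the first bucket holds 746 rows, matching the (30, 40] count in the AgeBucket section, and 4238 - 746 - 1609 = 1883 rows are left without a bucket.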

Reproduction

Analysis started: 2022-09-08 02:24:21.329591
Analysis finished: 2022-09-08 02:25:09.655145
Duration: 48.33 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

male
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2419
1      1819

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

Most occurring characters

Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2419   57.1%
1      1819   42.9%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 39
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.58494573
Minimum: 32
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 32
5-th percentile: 37
Q1: 42
median: 49
Q3: 56
95-th percentile: 64
Maximum: 70
Range: 38
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 8.572159925
Coefficient of variation (CV): 0.1728782758
Kurtosis: -0.9896358464
Mean: 49.58494573
Median Absolute Deviation (MAD): 7
Skewness: 0.2281457773
Sum: 210141
Variance: 73.48192578
Monotonicity: Not monotonic
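
Several of these statistics follow directly from others in the same block. As a rough check, the sketch below recomputes the coefficient of variation, variance, range and IQR for age from the reported mean, standard deviation and quantiles, using the usual textbook definitions; it does not reproduce how the profiling tool computes them internally.

```python
# Values reported for age in the tables above.
mean = 49.58494573
std = 8.572159925
minimum, maximum = 32, 70
q1, q3 = 42, 56

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
value_range = maximum - minimum    # range
iqr = q3 - q1                      # interquartile range

print(round(cv, 6), round(variance, 4), value_range, iqr)
```

The same relationships hold for the other numeric variables in this report.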
[Figure: Histogram with fixed size bins (bins=39)]

Value              Count  Frequency (%)
40                 191    4.5%
46                 182    4.3%
42                 180    4.2%
41                 174    4.1%
48                 173    4.1%
39                 169    4.0%
44                 166    3.9%
45                 162    3.8%
43                 159    3.8%
52                 149    3.5%
Other values (29)  2533   59.8%

Value  Count  Frequency (%)
32     1      < 0.1%
33     5      0.1%
34     18     0.4%
35     42     1.0%
36     84     2.0%
37     92     2.2%
38     144    3.4%
39     169    4.0%
40     191    4.5%
41     174    4.1%

Value  Count  Frequency (%)
70     2      < 0.1%
69     7      0.2%
68     18     0.4%
67     45     1.1%
66     38     0.9%
65     57     1.3%
64     93     2.2%
63     110    2.6%
62     99     2.3%
61     110    2.6%

currentSmoker
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2144
1      2094

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

Most occurring characters

Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2144   50.6%
1      2094   49.4%

cigsPerDay
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 34
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.00308862
Minimum: 0
Maximum: 70
Zeros: 2144
Zeros (%): 50.6%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 30
Maximum: 70
Range: 70
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 11.87923021
Coefficient of variation (CV): 1.319461655
Kurtosis: 1.051073151
Mean: 9.00308862
Median Absolute Deviation (MAD): 0
Skewness: 1.252198516
Sum: 38155.08957
Variance: 141.1161104
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=34)]

Value              Count  Frequency (%)
0                  2144   50.6%
20                 734    17.3%
30                 217    5.1%
15                 210    5.0%
10                 143    3.4%
9                  130    3.1%
5                  121    2.9%
3                  100    2.4%
40                 80     1.9%
1                  67     1.6%
Other values (24)  292    6.9%

Value  Count  Frequency (%)
0      2144   50.6%
1      67     1.6%
2      18     0.4%
3      100    2.4%
4      9      0.2%
5      121    2.9%
6      18     0.4%
7      12     0.3%
8      11     0.3%
9      130    3.1%

Value  Count  Frequency (%)
70     1      < 0.1%
60     11     0.3%
50     6      0.1%
45     3      0.1%
43     56     1.3%
40     80     1.9%
38     1      < 0.1%
35     22     0.5%
30     217    5.1%
29     1      < 0.1%

BPMeds
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value                Count
0.0                  4061
1.0                  124
0.02962962962962963  53

Length

Max length: 19
Median length: 3
Mean length: 3.200094384
Min length: 3

Characters and Unicode

Total characters: 13562
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value                Count  Frequency (%)
0.0                  4061   95.8%
1.0                  124    2.9%
0.02962962962962963  53     1.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value                Count  Frequency (%)
0.0                  4061   95.8%
1.0                  124    2.9%
0.02962962962962963  53     1.3%

Most occurring characters

Value  Count  Frequency (%)
0      8352   61.6%
.      4238   31.2%
2      265    2.0%
9      265    2.0%
6      265    2.0%
1      124    0.9%
3      53     0.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     9324   68.8%
Other Punctuation  4238   31.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      8352   89.6%
2      265    2.8%
9      265    2.8%
6      265    2.8%
1      124    1.3%
3      53     0.6%

Other Punctuation
Value  Count  Frequency (%)
.      4238   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  13562  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      8352   61.6%
.      4238   31.2%
2      265    2.0%
9      265    2.0%
6      265    2.0%
1      124    0.9%
3      53     0.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13562  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      8352   61.6%
.      4238   31.2%
2      265    2.0%
9      265    2.0%
6      265    2.0%
1      124    0.9%
3      53     0.4%

prevalentStroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      4213
1      25

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

Most occurring characters

Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      4213   99.4%
1      25     0.6%

prevalentHyp
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      2922
1      1316

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

Most occurring characters

Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      2922   68.9%
1      1316   31.1%

diabetes
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      4129
1      109

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

Most occurring characters

Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      4129   97.4%
1      109    2.6%

totChol
Real number (ℝ≥0)

Distinct: 249
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 236.7215855
Minimum: 107
Maximum: 696
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 107
5-th percentile: 170
Q1: 206
median: 234
Q3: 262
95-th percentile: 312
Maximum: 696
Range: 589
Interquartile range (IQR): 56

Descriptive statistics

Standard deviation: 44.32645264
Coefficient of variation (CV): 0.1872514184
Kurtosis: 4.216674211
Mean: 236.7215855
Median Absolute Deviation (MAD): 28
Skewness: 0.8766047676
Sum: 1003226.079
Variance: 1964.834404
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
240                 85     2.0%
220                 70     1.7%
260                 62     1.5%
210                 61     1.4%
232                 59     1.4%
250                 57     1.3%
200                 56     1.3%
225                 54     1.3%
230                 54     1.3%
205                 53     1.3%
Other values (239)  3627   85.6%

Value  Count  Frequency (%)
107    1      < 0.1%
113    1      < 0.1%
119    1      < 0.1%
124    1      < 0.1%
126    1      < 0.1%
129    1      < 0.1%
133    1      < 0.1%
135    2      < 0.1%
137    1      < 0.1%
140    2      < 0.1%

Value  Count  Frequency (%)
696    1      < 0.1%
600    1      < 0.1%
464    1      < 0.1%
453    1      < 0.1%
439    1      < 0.1%
432    1      < 0.1%
410    3      0.1%
405    1      < 0.1%
398    1      < 0.1%
392    1      < 0.1%

sysBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 234
Distinct (%): 5.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.3524068
Minimum: 83.5
Maximum: 295
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 83.5
5-th percentile: 104
Q1: 117
median: 128
Q3: 144
95-th percentile: 175
Maximum: 295
Range: 211.5
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.03809664
Coefficient of variation (CV): 0.1665107358
Kurtosis: 2.155019383
Mean: 132.3524068
Median Absolute Deviation (MAD): 13
Skewness: 1.145362136
Sum: 560909.5
Variance: 485.6777037
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
120                 107    2.5%
130                 102    2.4%
110                 96     2.3%
115                 89     2.1%
125                 88     2.1%
124                 84     2.0%
122                 80     1.9%
126                 73     1.7%
128                 73     1.7%
123                 72     1.7%
Other values (224)  3374   79.6%

Value  Count  Frequency (%)
83.5   2      < 0.1%
85     1      < 0.1%
85.5   1      < 0.1%
90     2      < 0.1%
92     1      < 0.1%
92.5   2      < 0.1%
93     2      < 0.1%
93.5   2      < 0.1%
94     3      0.1%
95     7      0.2%

Value  Count  Frequency (%)
295    1      < 0.1%
248    1      < 0.1%
244    1      < 0.1%
243    1      < 0.1%
235    1      < 0.1%
232    1      < 0.1%
230    1      < 0.1%
220    2      < 0.1%
217    1      < 0.1%
215    3      0.1%

diaBP
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 146
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 82.8934639
Minimum: 48
Maximum: 142.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 48
5-th percentile: 66
Q1: 75
median: 82
Q3: 89.875
95-th percentile: 104.575
Maximum: 142.5
Range: 94.5
Interquartile range (IQR): 14.875

Descriptive statistics

Standard deviation: 11.9108496
Coefficient of variation (CV): 0.1436886461
Kurtosis: 1.277099606
Mean: 82.8934639
Median Absolute Deviation (MAD): 7.5
Skewness: 0.714102184
Sum: 351302.5
Variance: 141.8683382
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
80                  262    6.2%
82                  152    3.6%
85                  137    3.2%
70                  135    3.2%
81                  131    3.1%
84                  122    2.9%
90                  119    2.8%
78                  116    2.7%
87                  113    2.7%
75                  108    2.5%
Other values (136)  2843   67.1%

Value  Count  Frequency (%)
48     1      < 0.1%
50     1      < 0.1%
51     1      < 0.1%
52     2      < 0.1%
53     1      < 0.1%
54     1      < 0.1%
55     3      0.1%
56     2      < 0.1%
57     6      0.1%
57.5   3      0.1%

Value  Count  Frequency (%)
142.5  1      < 0.1%
140    1      < 0.1%
136    2      < 0.1%
135    2      < 0.1%
133    2      < 0.1%
132    1      < 0.1%
130    5      0.1%
129    1      < 0.1%
128    1      < 0.1%
127.5  1      < 0.1%

BMI
Real number (ℝ≥0)

Distinct: 1364
Distinct (%): 32.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.80200758
Minimum: 15.54
Maximum: 56.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 15.54
5-th percentile: 20.06
Q1: 23.08
median: 25.41
Q3: 28.0375
95-th percentile: 32.7715
Maximum: 56.8
Range: 41.26
Interquartile range (IQR): 4.9575

Descriptive statistics

Standard deviation: 4.070952552
Coefficient of variation (CV): 0.1577765815
Kurtosis: 2.682302864
Mean: 25.80200758
Median Absolute Deviation (MAD): 2.49
Skewness: 0.9841813826
Sum: 109348.9081
Variance: 16.57265468
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value                Count  Frequency (%)
25.80200758          19     0.4%
22.91                18     0.4%
22.54                18     0.4%
23.48                18     0.4%
22.19                18     0.4%
23.09                16     0.4%
25.09                16     0.4%
23.1                 13     0.3%
22.73                13     0.3%
25.23                13     0.3%
Other values (1354)  4076   96.2%

Value  Count  Frequency (%)
15.54  1      < 0.1%
15.96  1      < 0.1%
16.48  1      < 0.1%
16.59  2      < 0.1%
16.61  1      < 0.1%
16.69  1      < 0.1%
16.71  1      < 0.1%
16.73  1      < 0.1%
16.75  1      < 0.1%
16.87  1      < 0.1%

Value  Count  Frequency (%)
56.8   1      < 0.1%
51.28  1      < 0.1%
45.8   1      < 0.1%
45.79  1      < 0.1%
44.71  1      < 0.1%
44.55  1      < 0.1%
44.27  1      < 0.1%
44.09  1      < 0.1%
43.69  1      < 0.1%
43.67  1      < 0.1%

heartRate
Real number (ℝ≥0)

Distinct: 74
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.87892377
Minimum: 44
Maximum: 143
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 44
5-th percentile: 60
Q1: 68
median: 75
Q3: 83
95-th percentile: 98
Maximum: 143
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 12.02517703
Coefficient of variation (CV): 0.158478487
Kurtosis: 0.9084053865
Mean: 75.87892377
Median Absolute Deviation (MAD): 7
Skewness: 0.6445577292
Sum: 321574.8789
Variance: 144.6048827
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value              Count  Frequency (%)
75                 563    13.3%
80                 385    9.1%
70                 305    7.2%
60                 231    5.5%
85                 227    5.4%
72                 222    5.2%
65                 197    4.6%
90                 172    4.1%
68                 151    3.6%
100                98     2.3%
Other values (64)  1687   39.8%

Value  Count  Frequency (%)
44     1      < 0.1%
45     2      < 0.1%
46     1      < 0.1%
47     1      < 0.1%
48     5      0.1%
50     22     0.5%
51     1      < 0.1%
52     17     0.4%
53     11     0.3%
54     12     0.3%

Value  Count  Frequency (%)
143    1      < 0.1%
140    1      < 0.1%
130    1      < 0.1%
125    3      0.1%
122    2      < 0.1%
120    7      0.2%
115    5      0.1%
112    3      0.1%
110    36     0.8%
108    8      0.2%

glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 144
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81.96675325
Minimum: 40
Maximum: 394
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.2 KiB

Quantile statistics

Minimum: 40
5-th percentile: 62
Q1: 72
median: 80
Q3: 85
95-th percentile: 107
Maximum: 394
Range: 354
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 22.83660313
Coefficient of variation (CV): 0.2786081213
Kurtosis: 64.88213732
Mean: 81.96675325
Median Absolute Deviation (MAD): 7
Skewness: 6.518745919
Sum: 347375.1003
Variance: 521.5104424
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
81.96675325         388    9.2%
75                  193    4.6%
77                  167    3.9%
73                  156    3.7%
80                  152    3.6%
70                  152    3.6%
83                  151    3.6%
78                  148    3.5%
74                  141    3.3%
85                  127    3.0%
Other values (134)  2463   58.1%

Value  Count  Frequency (%)
40     2      < 0.1%
43     1      < 0.1%
44     2      < 0.1%
45     4      0.1%
47     3      0.1%
48     1      < 0.1%
50     3      0.1%
52     2      < 0.1%
53     5      0.1%
54     5      0.1%

Value  Count  Frequency (%)
394    2      < 0.1%
386    1      < 0.1%
370    1      < 0.1%
368    1      < 0.1%
348    1      < 0.1%
332    1      < 0.1%
325    1      < 0.1%
320    1      < 0.1%
297    1      < 0.1%
294    1      < 0.1%

TenYearCHD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.2 KiB

Value  Count
0      3594
1      644

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4238
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

Most occurring characters

Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  4238   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  4238   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4238   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      3594   84.8%
1      644    15.2%

AgeBucket
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): 0.1%
Missing: 1883
Missing (%): 44.4%
Memory size: 4.5 KiB

Value     Count
(40, 50]  1609
(30, 40]  746

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 18840
Distinct characters: 8
Distinct categories: 5
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (30, 40]
2nd row: (40, 50]
3rd row: (40, 50]
4th row: (40, 50]
5th row: (40, 50]

Common Values

Value      Count  Frequency (%)
(40, 50]   1609   38.0%
(30, 40]   746    17.6%
(Missing)  1883   44.4%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value  Count  Frequency (%)
40     2355   50.0%
50     1609   34.2%
30     746    15.8%

Most occurring characters

Value    Count  Frequency (%)
0        4710   25.0%
(        2355   12.5%
4        2355   12.5%
,        2355   12.5%
(space)  2355   12.5%
]        2355   12.5%
5        1609   8.5%
3        746    4.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     9420   50.0%
Open Punctuation   2355   12.5%
Other Punctuation  2355   12.5%
Space Separator    2355   12.5%
Close Punctuation  2355   12.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      4710   50.0%
4      2355   25.0%
5      1609   17.1%
3      746    7.9%

Open Punctuation
Value  Count  Frequency (%)
(      2355   100.0%

Other Punctuation
Value  Count  Frequency (%)
,      2355   100.0%

Space Separator
Value    Count  Frequency (%)
(space)  2355   100.0%

Close Punctuation
Value  Count  Frequency (%)
]      2355   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  18840  100.0%

Most frequent character per script

Common
Value    Count  Frequency (%)
0        4710   25.0%
(        2355   12.5%
4        2355   12.5%
,        2355   12.5%
(space)  2355   12.5%
]        2355   12.5%
5        1609   8.5%
3        746    4.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  18840  100.0%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
0        4710   25.0%
(        2355   12.5%
4        2355   12.5%
,        2355   12.5%
(space)  2355   12.5%
]        2355   12.5%
5        1609   8.5%
3        746    4.0%

Interactions

[Figures: 64 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
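
A minimal sketch of that calculation in plain Python, assuming tied values receive the average of their rank positions (a common convention that the text above does not specify). The function names are illustrative and are not the profiling library's API.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (toy data).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y), pearson(x, y))
```

On the toy data, ρ is 1 up to floating-point rounding because the ranks agree exactly, while r on the raw values is below 1 because the relationship is not linear.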

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
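
The formula and the invariance property can be illustrated with a short plain-Python sketch. The data below are made up for the example, and `pearson_r` is an illustrative name rather than a library function.

```python
from statistics import mean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / (sx * sy)

# Toy data: roughly linear with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

The two printed values agree up to floating-point rounding. A negative scale factor on one variable would flip the sign of r.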

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
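
The pair-counting description above corresponds to the simplest form of the coefficient (often called τ-a). The sketch below implements exactly that form; tied pairs count as neither concordant nor discordant, and library implementations commonly apply a tie correction instead, so results on tied data can differ. `kendall_tau` is an illustrative name.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

The first call has only concordant pairs. In the second, two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.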

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
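
As an illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table using the correction as it is usually stated: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the row and column counts r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). Whether the profiling library uses exactly this form is an assumption. The function name and the toy tables are made up, and the code assumes no row or column of the table is entirely zero.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

perfect = [[50, 0], [0, 50]]        # each row value determines the column value
independent = [[25, 25], [25, 25]]  # rows and columns are unrelated
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two toy tables land at the ends of the range: about 1 for perfect association and 0 for independence.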

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

   male  age  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose  TenYearCHD  AgeBucket
0  1     39   0              0.0         0.0     0                0             0         195.0    106.0  70.0   26.97  80.0       77.0     0           (30.0, 40.0]
1  0     46   0              0.0         0.0     0                0             0         250.0    121.0  81.0   28.73  95.0       76.0     0           (40.0, 50.0]
2  1     48   1              20.0        0.0     0                0             0         245.0    127.5  80.0   25.34  75.0       70.0     0           (40.0, 50.0]
3  0     61   1              30.0        0.0     0                1             0         225.0    150.0  95.0   28.58  65.0       103.0    1           NaN
4  0     46   1              23.0        0.0     0                0             0         285.0    130.0  84.0   23.10  85.0       85.0     0           (40.0, 50.0]
5  0     43   0              0.0         0.0     0                1             0         228.0    180.0  110.0  30.30  77.0       99.0     0           (40.0, 50.0]
6  0     63   0              0.0         0.0     0                0             0         205.0    138.0  71.0   33.11  60.0       85.0     1           NaN
7  0     45   1              20.0        0.0     0                0             0         313.0    100.0  71.0   21.68  79.0       78.0     0           (40.0, 50.0]
8  1     52   0              0.0         0.0     0                1             0         260.0    141.5  89.0   26.36  76.0       79.0     0           NaN
9  1     43   1              30.0        0.0     0                1             0         225.0    162.0  107.0  23.61  93.0       88.0     0           (40.0, 50.0]

Last rows

      male  age  currentSmoker  cigsPerDay  BPMeds   prevalentStroke  prevalentHyp  diabetes  totChol  sysBP  diaBP  BMI    heartRate  glucose     TenYearCHD  AgeBucket
4228  0     50   0              0.0         0.00000  0                1             1         260.0    190.0  130.0  43.67  85.0       260.000000  0           (40.0, 50.0]
4229  0     51   1              20.0        0.00000  0                1             0         251.0    140.0  80.0   25.60  75.0       81.966753   0           NaN
4230  0     56   1              3.0         0.00000  0                1             0         268.0    170.0  102.0  22.89  57.0       81.966753   0           NaN
4231  1     58   0              0.0         0.00000  0                1             0         187.0    141.0  81.0   24.96  80.0       81.000000   0           NaN
4232  1     68   0              0.0         0.00000  0                1             0         176.0    168.0  97.0   23.14  60.0       79.000000   1           NaN
4233  1     50   1              1.0         0.00000  0                1             0         313.0    179.0  92.0   25.97  66.0       86.000000   1           (40.0, 50.0]
4234  1     51   1              43.0        0.00000  0                0             0         207.0    126.5  80.0   19.71  65.0       68.000000   0           NaN
4235  0     48   1              20.0        0.02963  0                0             0         248.0    131.0  72.0   22.00  84.0       86.000000   0           (40.0, 50.0]
4236  0     44   1              15.0        0.00000  0                0             0         210.0    126.5  87.0   19.16  86.0       81.966753   0           (40.0, 50.0]
4237  0     52   0              0.0         0.00000  0                0             0         269.0    133.5  83.0   21.47  80.0       107.000000  0           NaN